Hardware Support for Flexible Distributed Shared Memory

Authors

  • Steven K. Reinhardt
  • Robert W. Pfile
  • David A. Wood
Abstract

Workstation-based parallel systems are attractive due to their low cost and competitive uniprocessor performance. However, supporting a cache-coherent global address space on these systems involves significant overheads. We examine two approaches to coping with these overheads. First, DSM-specific hardware can be added to the off-the-shelf component base to reduce overheads. Second, application-specific coherence protocols can avoid some overheads by exploiting programmer (or compiler) knowledge of an application’s communication patterns. To explore the interaction between these approaches, we simulated four designs that add DSM acceleration hardware to a collection of off-the-shelf workstation nodes. Three of the designs support user-level software coherence protocols, enabling application-specific protocol optimizations. To verify the feasibility of our hardware approach, we constructed a prototype of the simplest design. Measured speedups from the prototype match simulation results closely. We find that, even with aggressive DSM hardware support, custom protocols can provide significant speedups for some applications. In addition, the custom protocols are generally effective at reducing the impact of other overheads, including those due to less aggressive hardware support and larger network latencies. However, for three of our benchmarks, the additional hardware acceleration provided by our most aggressive design avoids the need to develop more efficient custom protocols.

Similar resources

Exploring the Value of Supporting Multiple DSM Protocols in Hardware DSM Controllers

The performance of a hardware distributed shared memory (DSM) system is largely dependent on its architect’s ability to reduce the number of remote memory misses that occur. Previous attempts to solve this problem have included measures such as supporting both the CC-NUMA and S-COMA architectures in the same machine and providing a programmable DSM controller that can emulate any DSM mechanism....

Command-Triggered Microcode Execution for Distributed Shared Memory Based Multi-Core Network-on-Chips

Technology advance enables integration of a lot of resources on multi-core Network-on-Chips (NoCs). In such complex system, memories are preferably distributed and supporting Distributed Shared Memory (DSM) is essential for the sake of re-using huge amount of legacy code and easy programming. Besides, the design complexity of multi-core NoCs results in long time-to-market and high cost. Motivat...

Operating System Support for Flexible Coherence in Distributed Shared Memory

COMMOS is an operating system architecture developed to support shared persistent data objects in distributed systems. This paper describes its support for flexible coherence. The approach is based on a microkernel, typed memory objects and integrated coherence control. The coherence server is clearly separated from the external pager. This separation makes it easier to provide multiple cohere...

Dimensions of Verifying the Hardware-Software Interface in a Shared-Memory Multiprocessor

Scalable shared-memory multiprocessors provide a flexible programming model with good performance scaling. These features, however, come at the expense of additional hardware complexity to provide a consistent view of the memory hierarchy. Verifying this aspect of a multiprocessor system is nontrivial, often requiring far more time than the actual implementation. We investigate the various appr...

Using Memory-Mapped Network Interfaces to Improve the Performance of Distributed Shared Memory

Shared memory is widely believed to provide an easier programming model than message passing for expressing parallel algorithms. Distributed Shared Memory (DSM) systems provide the illusion of shared memory on top of standard message passing hardware at very low implementation cost, but provide acceptable performance for only a limited class of applications. We argue that the principal sources ...

Journal:
  • IEEE Trans. Computers

Volume 47

Publication date: 1998